Comparison of local classifiers for cross-project defect prediction

Authors

  • Alexander Trautsch
  • Steffen Herbold
Abstract

Static source code metrics, such as lines of code or cyclomatic complexity, are connected to potential defects in the source code. No closed formula describes this connection, but machine learning techniques can infer rules from large amounts of data. In this thesis, we use machine learning techniques to predict defects in software components from source code metrics. We train and evaluate our models on a subset of a publicly available dataset from open source projects. Because we perform cross-project defect prediction, we predict defects in one software project using classifiers trained on the source code metrics of other projects. We evaluate the cross-project defect prediction performance of two local classifier implementations. Local classifiers first partition the training data and then train one classifier per partition. Each partition is more homogeneous than the complete data, which should yield better predictions than a single global classifier trained on all of the data. We run six experiments that compare our two local classifier implementations with an existing global classifier that already achieves a high success rate in cross-project defect prediction. The experiments are conducted with the CrossPare framework for ease of implementation and comparability of results. They use two different methods of bias handling, equal weighting and undersampling, which additionally shows how these techniques influence our local classifiers and the prediction performance.
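
The local classifier idea described in the abstract (partition the training data, then train one classifier per partition) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis or from CrossPare: the partitioning is plain k-means with a deterministic farthest-first start, the per-partition classifier is a nearest-class-centroid rule, the function names (`partition`, `train_local`, `predict_local`, `undersample`) are hypothetical, and the metric values are invented toy data. Undersampling is shown as one possible form of bias handling; equal weighting is not shown.

```python
import random
from collections import defaultdict

def dist2(a, b):
    """Squared Euclidean distance between two metric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))

def nearest(centers, v):
    """Index of the center closest to v."""
    return min(range(len(centers)), key=lambda i: dist2(centers[i], v))

def undersample(data, seed=0):
    """Randomly drop instances until every class is as small as the rarest one."""
    by_label = defaultdict(list)
    for features, label in data:
        by_label[label].append((features, label))
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    return [inst for group in by_label.values()
            for inst in rng.sample(group, smallest)]

def partition(vectors, k, iterations=10):
    """Plain k-means; initial centers are picked farthest-first (assumption)."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors,
                           key=lambda v: min(dist2(c, v) for c in centers)))
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in vectors:
            groups[nearest(centers, v)].append(v)
        # Keep the old center if a partition received no instances.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers

def train_local(data, k, balance=False):
    """Partition the training data, then fit one simple model per partition."""
    if balance:
        data = undersample(data)
    centers = partition([features for features, _ in data], k)
    per_partition = [defaultdict(list) for _ in centers]
    for features, label in data:
        per_partition[nearest(centers, features)][label].append(features)
    # Local model: one centroid per class seen inside the partition.
    models = [{label: centroid(vs) for label, vs in part.items()}
              for part in per_partition]
    return centers, models

def predict_local(model, features):
    """Route the instance to its partition and apply that partition's model."""
    centers, models = model
    local = models[nearest(centers, features)]
    return min(local, key=lambda label: dist2(local[label], features))

# Toy data (invented): (lines of code, cyclomatic complexity) -> 1 if defective.
train = [
    ((10, 1), 0), ((12, 2), 0), ((11, 8), 1), ((9, 9), 1),
    ((500, 20), 0), ((520, 25), 0), ((510, 80), 1), ((490, 90), 1),
]
model = train_local(train, k=2, balance=True)
print(predict_local(model, (10, 7)), predict_local(model, (505, 30)))  # prints "1 0"
```

In this toy data a complexity of 7 is suspicious for a small file but unremarkable for a large one, which is the kind of heterogeneity a per-partition model can capture. The sketch skips metric normalization and does not handle a partition that ends up with no training instances; a real experiment would need both.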

Similar resources

Simplification of Training Data for Cross-Project Defect Prediction

Cross-project defect prediction (CPDP) plays an important role in estimating the most likely defect-prone software components, especially for new or inactive projects. To the best of our knowledge, few prior studies provide explicit guidelines on how to select suitable training data of quality from a large number of public software repositories. In this paper, we have proposed a training data s...

An empirical study on software defect prediction with a simplified metric set

Context: Software defect prediction plays a crucial role in estimating the most defect-prone components of software, and a large number of studies have pursued improving prediction accuracy within a project or across projects. However, the rules for making an appropriate decision between within- and cross-project defect prediction when available historical data are insufficient remain unclear. Ob...

Metaheuristic Optimization based Feature Selection for Software Defect Prediction

Software defect prediction has been an important research topic in the software engineering field, especially to address the inefficiency and ineffectiveness of existing industrial approaches to software testing and reviews. Software defect prediction performance decreases significantly when the data set contains noisy attributes and class imbalance. Feature selection is generally used in ma...

A Systematic Study of Cross-Project Defect Prediction With Meta-Learning

The prediction of defects in a target project based on data from external projects is called Cross-Project Defect Prediction (CPDP). Several methods have been proposed to improve the predictive performance of CPDP models. However, there is a lack of comparison among state-of-the-art methods. Moreover, previous work has shown that the most suitable method for a project can vary according to the ...

ECLogger: Cross-Project Catch-Block Logging Prediction Using Ensemble of Classifiers

Background: Software developers insert log statements in the source code to record program execution information. However, optimizing the number of log statements in the source code is challenging. Machine learning based within-project logging prediction tools, proposed in previous studies, may not be suitable for new or small software projects. As these software projects do not have sufficient...

Publication date: 2014